Performance Inconsistency in Large Scale Data Processing Clusters
Abstract
A large shared computing platform is usually divided into several virtual clusters (VCs) of fixed sizes, each used by one team. A cluster scheduler dynamically allocates physical servers to the virtual clusters according to their sizes and current job demands. In this paper, we show that current cluster schedulers, which optimize for instantaneous fairness, cause performance inconsistency among the virtual clusters: VCs with similar loads see very different performance characteristics. We identify this problem by studying a production trace obtained from a large cluster and by performing a simulation study. Our results demonstrate that, under an instantaneous-fairness scheduler, a large VC that contributes more resources during its underload periods cannot be properly rewarded during its overload periods. These results suggest that ignoring resource-sharing history is the root cause of the performance inconsistency.
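The effect described in the abstract can be illustrated with a small sketch. The abstract does not say which allocation rule the studied schedulers use, so the code below assumes a weighted max-min (progressive filling) policy as a stand-in for "instantaneous fairness". The function name, the VC names and all numbers are hypothetical and are not taken from the paper or its trace.

```python
def instantaneous_fair_allocation(total_servers, sizes, demands):
    """Weighted max-min allocation for a single scheduling instant.

    Assumption: this stands in for an instantaneous-fairness scheduler.
    sizes   -- configured size (weight) of each virtual cluster
    demands -- servers each virtual cluster wants right now
    Only the current sizes and demands are used; no history is kept.
    """
    alloc = {vc: 0.0 for vc in sizes}
    active = {vc for vc in sizes if demands[vc] > 0}
    remaining = float(total_servers)
    while active and remaining > 1e-9:
        weight_sum = sum(sizes[vc] for vc in active)
        # Offer each unsatisfied VC a slice proportional to its size.
        offers = {vc: remaining * sizes[vc] / weight_sum for vc in active}
        satisfied = {vc for vc in active if demands[vc] <= offers[vc] + 1e-9}
        if not satisfied:
            # Every active VC wants more than its slice: hand out the slices.
            for vc in active:
                alloc[vc] = offers[vc]
            break
        # Fully serve the VCs whose demand fits, then redistribute the rest.
        for vc in satisfied:
            alloc[vc] = float(demands[vc])
            remaining -= demands[vc]
        active -= satisfied
    return alloc


sizes = {"A": 60, "B": 40}  # hypothetical VC sizes on a 100-server platform

# Period 1: A is underloaded, so 40 of its servers go to the overloaded B.
period1 = instantaneous_fair_allocation(100, sizes, {"A": 20, "B": 90})
print(period1)  # A gets 20.0, B gets 80.0

# Period 2: A is now overloaded, but receives only its configured size.
period2 = instantaneous_fair_allocation(100, sizes, {"A": 100, "B": 90})
print(period2)  # A gets 60.0, B gets 40.0
```

In this toy run, A lends 40 servers in the first period and gets nothing extra in the second, because the allocation depends only on the current sizes and demands. A scheduler that tracked sharing history could use the earlier contribution to favor A during its overload.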
Related articles
Using data envelopment analysis (DEA) to improve the sales performance in Iranian agricultural clusters by utilizing business networks and business development services providers (BDSPs)
Business clusters play an important role in developing and improving the economic performance of countries and in promoting the welfare of people. Business development service providers (hereafter referred to as BDSPs) have a considerable role in providing specialized services pertinent to the conditions of active enterprises in clusters and in promoting their performance level in order to impr...
Determination of Cluster Hydrodynamics in Bubbling Fluidized Beds by the EMMS Approach
The local solid flow structure of gas-solid bubbling fluidized bed was investigated to identify and characterize the particle clusters. Extensive mathematical calculations were carried out using the energy-minimization multi-scale (EMMS) approach for evaluating cluster properties including the velocity, the size and the void fraction of clusters in the dense phase of the bed. The results showed...
Data Clustering Based on Key Identification
Clustering has been one of the main building blocks in the fields of machine learning and computer vision. Given a pair-wise distance measure, it is challenging to find a proper way to identify a subset of representative exemplars and their associated cluster structures. The recent trend in big data analysis places more demanding requirements on new clustering algorithms, which must be both scalable and accura...
Data-Replicas Scheduler for Heterogeneous MapReduce Cluster
Large-scale data processing has grown rapidly in recent years. The MapReduce programming model, which was first introduced in functional languages, appeared in distributed systems and has performed excellently in large-scale data processing since 2006. Hadoop, the most popular open-source MapReduce runtime framework, supplies reliable, scalable and distributed system processing la...
Hopper: Decentralized Speculation-aware Cluster Scheduling at Scale – Public Review
The huge volume of data available today has led to interest in parallel processing on commodity clusters. Data analytics distributed frameworks such as Hadoop, Spark, or Pregel are designed for parallel processing of a large amount of data. These frameworks break a computation job into small tasks that run in parallel on multiple machines, and aim to scale to very large clusters of inexpensive ...